Background

Polycystic ovary syndrome (PCOS) is a complex condition characterized by elevated androgen levels, menstrual irregularities, and/or small cysts on one or both ovaries.

The disorder can be morphological (polycystic ovaries) or predominantly biochemical (hyperandrogenemia).

Hyperandrogenism, a clinical hallmark of PCOS, can inhibit follicular development and lead to ovarian microcysts, anovulation, and menstrual irregularities.

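library(dplyr)      # mutate()
library(ggplot2)    # ggplot(), geom_smooth(), ggsave()
library(ggfortify)  # autoplot() methods for glm objects

pdata$Oligomenorrhea <- as.factor(pdata$Oligomenorrhea)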
pdata$Polycystic.Ovarian.Morphology <- as.factor(pdata$Polycystic.Ovarian.Morphology)
pdata$Hyperandrogenism <- as.factor(pdata$Hyperandrogenism)
pdata$Category <- as.factor(pdata$Category)

pdata <- mutate(pdata, Weight.Category = ifelse((BMI > 25), "Overweight", "Healthy/Low"))
pdata$Weight.Category <- as.factor(pdata$Weight.Category)
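The `ifelse()` comparison above places a BMI of exactly 25 in the Healthy/Low group. If the WHO convention (overweight at BMI ≥ 25) is intended, `cut()` with `right = FALSE` makes the boundary explicit. A minimal sketch; the helper name `weight_category()` is hypothetical:

```r
# Hypothetical helper: classify BMI into the two groups used in this analysis.
# right = FALSE makes the intervals [-Inf, 25) and [25, Inf),
# so a BMI of exactly 25 is labelled "Overweight" (WHO cutoff: BMI >= 25).
weight_category <- function(bmi) {
  cut(bmi,
      breaks = c(-Inf, 25, Inf),
      labels = c("Healthy/Low", "Overweight"),
      right = FALSE)
}

# Usage on the analysis data frame:
# pdata$Weight.Category <- weight_category(pdata$BMI)
```

Unlike `ifelse()`, `cut()` returns a factor directly, so a separate `as.factor()` call is not needed.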

head(pdata)
##   Specimen.id. Age  BMI Hirsutism Testosterone.ng.mL Oligomenorrhea
## 1            1  34 24.8         8               0.26       No Oligo
## 2            2  23 27.4         9               0.60       No Oligo
## 3            3  26 44.0         4               0.32 Oligomenorrhea
## 4            4  32 29.2        10               0.85 Oligomenorrhea
## 5            5  20 25.8         8               0.49 Oligomenorrhea
## 6            7  26 33.4         6               0.73 Oligomenorrhea
##      Hyperandrogenism Polycystic.Ovarian.Morphology   Category Weight.Category
## 1    Hyperandrogenism                   PCO ovaries    HA+PCOM     Healthy/Low
## 2    Hyperandrogenism                   PCO ovaries    HA+PCOM      Overweight
## 3 No Hyperandrogenism                   PCO ovaries Oligo+PCOM      Overweight
## 4    Hyperandrogenism                   PCO ovaries       PCOS      Overweight
## 5    Hyperandrogenism                No PCO ovaries       PCOS      Overweight
## 6    Hyperandrogenism                   PCO ovaries       PCOS      Overweight
names(pdata) #verify all variables are present
##  [1] "Specimen.id."                  "Age"                          
##  [3] "BMI"                           "Hirsutism"                    
##  [5] "Testosterone.ng.mL"            "Oligomenorrhea"               
##  [7] "Hyperandrogenism"              "Polycystic.Ovarian.Morphology"
##  [9] "Category"                      "Weight.Category"
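Instead of checking the `names()` output by eye, the presence of the expected columns can be asserted so that knitting fails early if a variable is missing. A sketch; `check_columns()` is a hypothetical helper and the column list is copied from the output above:

```r
# Hypothetical helper: stop with an informative error if any expected column is absent.
check_columns <- function(data, required) {
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Column names as printed by names(pdata)
expected_cols <- c("Specimen.id.", "Age", "BMI", "Hirsutism",
                   "Testosterone.ng.mL", "Oligomenorrhea", "Hyperandrogenism",
                   "Polycystic.Ovarian.Morphology", "Category", "Weight.Category")

# Usage: check_columns(pdata, expected_cols)
```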

Hypothesis

Higher testosterone levels and older age are each associated with a higher BMI.
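Written as a model, the hypothesis corresponds to an additive regression of BMI on testosterone and age (the form fitted as `m1` below), with the hypothesis stating that both slopes are positive. Here $T_i$ and $A_i$ denote participant $i$'s testosterone (ng/mL) and age (years):

```latex
\mathrm{BMI}_i = \beta_0 + \beta_1 T_i + \beta_2 A_i + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)

H_0:\ \beta_j = 0 \quad\text{vs.}\quad H_1:\ \beta_j > 0, \qquad j \in \{1, 2\}
```

`summary()` reports two-sided p-values for each coefficient; for a one-sided test in the hypothesized direction, the p-value is halved when the estimate is positive.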

ggplot(
  pdata,
  aes(
    x=Testosterone.ng.mL,
    y=BMI
    )
  ) + 
  geom_point(
    size=1) + 
  theme_minimal() +
  geom_smooth(
    formula = 'y ~ x',
    method = 'lm',
    se=FALSE) +
  labs(
    x="Testosterone (ng/mL)",
    y="BMI",
    title="Testosterone Levels v. BMI"
  )

ggsave("Testosterone-v-BMI.png", height=4.5, width = 4.5, units = "in")


ggplot(
  pdata,
  aes(
    x=Age,
    y=BMI
    )
  ) + 
  geom_point(
    size=1) + 
  theme_minimal() +
  geom_smooth(
    formula = 'y ~ x',
    method = 'lm',
    se=FALSE) +
  labs(
    x="Age",
    y="BMI",
    title="Participant Age v. BMI"
  )

ggsave("Age-v-BMI.png", height=4.5, width = 4.5, units = "in")


ggplot(
  pdata,
  aes(
    x=Testosterone.ng.mL,
    y=BMI
    )
  ) + 
  geom_point(
    aes(color=Age),  # map Age to colour for the points only, so geom_smooth fits one overall line
    size=1
  ) + 
  scale_colour_gradient(
    low = "red",
    high = "green"
  ) +
  theme_gray() +
  geom_smooth(
    formula = 'y ~ x',
    method = 'lm',
    se=FALSE) +
  labs(
    x="Testosterone (ng/mL)",
    y="BMI",
    color="Age",
    title="Testosterone Levels v. BMI",
    subtitle="colored by age"
  )

ggsave("Testosterone+Age-v-BMI.png", height=4.5, width = 6, units = "in")

Comment on the plots. Justify omitting an interaction term between the two independent variables (testosterone and age). Revise the hypothesis accordingly.
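One way to support the no-interaction statement is to fit the model with and without a testosterone × age interaction and compare the two with an F test. A minimal sketch, assuming the Gaussian GLM used for `m1` below; `compare_interaction()` is a hypothetical helper:

```r
# Hypothetical helper: compare the additive model with one that adds a
# Testosterone.ng.mL:Age interaction term (Gaussian GLM, as for m1).
compare_interaction <- function(data) {
  additive <- glm(BMI ~ Testosterone.ng.mL + Age, data = data)
  interaction <- glm(BMI ~ Testosterone.ng.mL * Age, data = data)  # adds Testosterone.ng.mL:Age
  # F test on the drop in deviance; a large p-value is consistent with
  # leaving the interaction out of the model.
  anova(additive, interaction, test = "F")
}

# Usage: compare_interaction(pdata)
```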

m1 <- glm(BMI ~ Testosterone.ng.mL + Age, data = pdata)  # gaussian family by default, equivalent to lm()
summary(m1)
## 
## Call:
## glm(formula = BMI ~ Testosterone.ng.mL + Age, data = pdata)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -7.8367  -3.8653  -0.8624   2.1259  20.9440  
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        14.58869    2.70096   5.401 2.18e-07 ***
## Testosterone.ng.mL  5.06079    2.07240   2.442  0.01562 *  
## Age                 0.26338    0.08129   3.240  0.00144 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 28.36949)
## 
##     Null deviance: 5265.0  on 174  degrees of freedom
## Residual deviance: 4879.6  on 172  degrees of freedom
## AIC: 1087
## 
## Number of Fisher Scoring iterations: 2
autoplot(m1)  # residual diagnostic plots (autoplot method provided by ggfortify)

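Statement justifying the switch to a Poisson GLM: its residual deviance (186.92) is close to its residual degrees of freedom (172), a ratio of about 1.09, which suggests little overdispersion. Because BMI is not an integer count, the Poisson log-likelihood evaluates to -Inf and the AIC is reported as Inf, so AIC cannot be used to compare this model with the Gaussian one.

summary(glm(BMI ~ Testosterone.ng.mL + Age, family = "poisson", data = pdata))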
## 
## Call:
## glm(formula = BMI ~ Testosterone.ng.mL + Age, family = "poisson", 
##     data = pdata)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6277  -0.8110  -0.1694   0.4332   3.8700  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        2.793357   0.102725  27.193  < 2e-16 ***
## Testosterone.ng.mL 0.204905   0.077715   2.637 0.008374 ** 
## Age                0.010736   0.003071   3.497 0.000471 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 202.67  on 174  degrees of freedom
## Residual deviance: 186.92  on 172  degrees of freedom
## AIC: Inf
## 
## Number of Fisher Scoring iterations: 4
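Because the Poisson model uses a log link, its slope coefficients act multiplicatively on expected BMI, and the ratio of residual deviance to residual degrees of freedom gives a quick overdispersion check. A sketch under those assumptions; `poisson_checks()` is a hypothetical helper that refits the model shown above:

```r
# Hypothetical helper: refit the Poisson GLM and return the quantities used
# to judge it. BMI is not an integer count, so dpois() warns about non-integer
# values while the AIC is computed; those warnings are suppressed here.
poisson_checks <- function(data) {
  fit <- suppressWarnings(
    glm(BMI ~ Testosterone.ng.mL + Age, family = poisson, data = data)
  )
  list(
    # Residual deviance / residual df; values near 1 suggest little overdispersion
    dispersion_ratio = deviance(fit) / df.residual(fit),
    # exp(slope): multiplicative change in expected BMI per one-unit increase
    multipliers = exp(coef(fit)[-1]),
    # Inf for non-integer responses, so not usable for model comparison
    aic = fit$aic
  )
}

# Usage: poisson_checks(pdata)
```

With the estimates printed above, the multipliers are about exp(0.2049) ≈ 1.23 for testosterone and exp(0.0107) ≈ 1.011 for age: under this model, each additional 1 ng/mL of testosterone is associated with roughly 23% higher expected BMI, and each additional year of age with roughly 1.1% higher expected BMI, holding the other variable fixed.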

[Interactive 3D scatter plot (plotly scatter3d trace, markers)]
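When `plot_ly()` receives x, y, and z but no `type` or `mode`, plotly infers a `scatter3d` trace drawn as markers and prints a notice each time. Passing both explicitly avoids the notices. A sketch, assuming the 3D plot maps testosterone, age, and BMI to the three axes (the original plotting call is not shown); `make_3d_plot()` is a hypothetical helper:

```r
# Load plotly quietly; this suppresses the package-attach and masking messages.
suppressPackageStartupMessages(library(plotly))

# Hypothetical helper: 3D scatter of testosterone, age, and BMI (axis mapping assumed).
make_3d_plot <- function(data) {
  p <- plot_ly(
    data,
    x = ~Testosterone.ng.mL,
    y = ~Age,
    z = ~BMI,
    type = "scatter3d",  # explicit trace type, so plotly does not have to infer it
    mode = "markers",    # explicit mode, so no default-mode notice is printed
    marker = list(size = 3)
  )
  # plotly::layout is named explicitly because graphics::layout has the same name
  plotly::layout(
    p,
    scene = list(
      xaxis = list(title = "Testosterone (ng/mL)"),
      yaxis = list(title = "Age"),
      zaxis = list(title = "BMI")
    )
  )
}

# Usage: make_3d_plot(pdata)
```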